Who wrote that? 
A brief overview of modern forensic linguistics methods for determining authorship.

The following article tries to give an overview from a non-technical perspective and to make a 
corresponding evaluation. There are some academic publications on this topic that could be evaluated for a 
better assessment. However, my main purpose here is just to raise the issue, not to provide a sound and
conclusive view, so if you know anything more, publish it!

Avoiding traces that could be your undoing down the road — perhaps even after years or decades — 
is probably of interest to most people who occasionally commit a crime and come into conflict with the 
law. Avoiding fingerprints, avoiding DNA traces, avoiding shoe prints and textile fiber traces or at least 
disposing of clothing afterwards, avoiding surveillance cameras, avoiding tool traces, avoiding recordings 
of any kind, recognizing surveillance, etc. — all this should be a concern for anyone who commits crimes 
from time to time and wants to protect themselves from identification. But what about those traces that 
often arise only after a crime has been committed, out of the urge to explain one’s deed anonymously or 
even by using a recurring pseudonym? When writing and publishing a communiqué? 

My impression is that in many cases no special attention is paid to these traces despite a rapid 
technological development of analytical capacities. This may be intentional, negligent, or a compromise 
of competing needs. Without wishing to make a general suggestion here on how to deal with these 
traces — after all, everyone must determine that for themselves — I would like to outline the methods the 
investigative authorities in Germany and elsewhere are currently (probably) working with, what seems 
possible in theory, and what could become possible in the future. 

Perhaps I should note in advance that everything or at least most of what I present here is scientifically 
as well as legally controversial. I am also less interested in the legal validity of linguistic analyses — and not 
in the scientific one either — than in whether it seems plausible that these investigations could guide a 
surveillance effort, because even if a trail is not useful in court by itself, it could still lead to other, useful 
trails. 


Author Identification at the BKA [Federal Criminal Police Office of Germany]


According to its own information, the Federal Criminal Police Office (BKA) maintains a department 
dedicated to identifying the authors of texts. The focus is on texts related to criminal acts, such as 
responsibility claims, but also “position papers” from the “left-wing extremist spectrum,” among others. 
All collected texts are linguistically processed, stored in a so-called collection of communiqués, and can be
compared and searched with the Criminal Information System for Texts (KISTE). According to the BKA,
the texts are classified according to the following biographical characteristics of their (alleged) authors: 
origin, age, education and occupation. 

All incoming texts are also compared with previously saved texts to determine whether several texts 
may have been written by the same author. 

In the context of case-specific investigations, the stored texts can also be compared with texts whose 
authorship is known in order to determine whether they were written by the same author or whether this 
can be ruled out. 

This is the official information from the BKA about this department. What does this mean in practice? 

I think that one can assume that at least all responsibility claims are recorded in this database and 
analyzed to see whether there are other responsibility claims by the same author(s). The finding that they 
also record “position papers” allows us to draw further conclusions: at the very least, it seems possible that 
in addition to texts with criminal relevance, they also store other texts that are thought to come from a 
particular scene. For example, texts from newspapers, statements from political groups/organizations, calls, 
blog posts, etc. In the worst case, I would assume that all published texts on known “left-wing extremist” 
websites (after all, it is quite easy to get hold of them), as well as texts from print publications that appear 
interesting to the investigating authorities, would be fed into this database. 

This would mean that for each responsibility claim, the BKA would have a cluster of texts that they
presume to have the same author. These can consist of other claims as well as texts that have been fed
into the database. In addition to series of crimes, further clues to perpetrators can be obtained, such as 
pseudonyms, group names — or, in the worst case, names — under which an author of a claim may have 
written other texts, but also, depending on the text, all kinds of other information that it provides, often 
including clues to a person’s place of residence and activity, thematic focus, biographical characteristics, 
educational background, etc. All of this information can at the very least be used to narrow down the 
circle of suspects. 

What remains unclear in all of this is what other comparison samples the BKA might obtain. For 
most people, there is certainly a whole series of texts to which investigating authorities (could) have access 
and which could be fed into the database in the event of suspicion or possibly also partly as a precaution 
— if a person is on file with an entry such as “violent left-wing extremist”, etc. This could be anything 
with your name under it, from a letter to an authority to a letter to the editor in the newspaper. I will 
intentionally name only the most obvious sources here, so as not to inadvertently provide the investigating 
authorities with decisive inspiration, but I’m sure you can answer for yourself which texts of yours might 
be accessible. If the profilers of the BKA succeed in narrowing down the circle of suspects by a specific
characteristic, this allows comparison with masses of available text samples (for example, if it is
assumed that a scientist of a certain discipline is responsible for a letter, all publications in this field could
be used as comparison samples). This would, for example, be a possible (partial) explanation for how it
might have gone with Andrej Holm in the case against the militante gruppe (mg), at least if one assumes
that the BKA did not just Google “gentrification”. I therefore think it is quite possible that such analyses are also
carried out.


Methods of author recognition and author profiling


All this, however, only considers what the BKA claims to be able to do and takes these considerations to 
some logical conclusions. But how does author recognition or author profiling actually work? 

Who hasn't felt the fear that the German teacher might expose you after a mocking poem about
a teacher appeared in the washrooms, and the whole school is joking that only you could have
written “vacuum” [Leerer] instead of “teacher” [Lehrer]? Fortunately, the entire German faculty fell for it,
adopting the narrative of a spelling mistake and turning a blind eye to the all-too-accurate pun. Forensic 
linguistics does seem to require a bit of practice, or at least a criminological motivation, who knows. In any 
case, error analysis, which most have probably heard of, was one of the BKA’s most important analysis 
tools around 2002 along with style analysis, according to a promotional article by language cop Christa 
Baldauf. Spelling mistakes, grammatical errors, punctuation, but also typos, new or old spelling, hints of
keyboard peculiarities, etc.: all of this helps the language cops collect clues about the author. For example,
if I write “muß” instead of “muss”, that could be a clue that I missed some of the more recent spelling
reforms when I was in school. If, on the other hand, I constantly write “ss” in terms that, according to spelling
rules, take “ß”, it could mean that there is no “ß” on my keyboard. If I speak of
“dem Butter” [rather than “die Butter”], it could indicate that I grew up in Bavaria, etc. But
I could also be faking all these things just to mislead the language cops. The plausibility of my error profile
is also part of such an analysis. Similarly, stylistic analysis examines peculiarities of my writing style. What
kind of terms do I use, does my sentence structure show specific patterns, are there repeated constellations 
of terms that may even appear in different texts, etc.? I think everyone who takes a closer look at his or her 
texts will recognize some stylistic characteristics of their own. 
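
As a rough illustration of what a tally of such error markers might look like, here is a minimal Python sketch. The marker names and word lists are hypothetical and chosen only for this example; nothing here is known about how the BKA actually implements error analysis.

```python
import re
from collections import Counter

# Hypothetical marker patterns, for illustration only. A real error analysis
# would cover far more phenomena (grammar, punctuation, typos, dialect, ...).
MARKERS = {
    # spellings from before the 1996 German spelling reform
    "old_spelling": re.compile(r"\b(?:muß|daß)\b"),
    # the corresponding reformed spellings
    "reformed_spelling": re.compile(r"\b(?:muss|dass)\b"),
    # "ss" where German spelling rules call for "ß"
    "ss_for_eszett": re.compile(r"\b(?:Strasse|Gruss|heisst)\b"),
}

def error_profile(text):
    """Count how often each marker pattern occurs in the text."""
    return Counter({name: len(pattern.findall(text))
                    for name, pattern in MARKERS.items()})

sample = "Ich muß sagen, daß die Strasse leer ist."
print(error_profile(sample))
```

Such counts are only raw material. Whether the resulting profile is plausible or deliberately faked is a judgment that, as described above, is itself part of the analysis.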

Such qualitative analyses primarily serve to profile the authors. While it is certainly possible to
match different texts in this way, the real value of such analyses lies in being able to determine things like
age, “level of education”, “scene affiliation”, regional origins, and sometimes perhaps even indications of
occupation/training, etc. Attempts to determine things like gender have also been reported, but generally do not
seem to be quite as straightforward.

In contrast, there are also more quantitative and statistical analyses that examine everything that can be
measured in this way, from word frequencies to word constellations to the syntactic structure of sentences. These
methods, known as stylometry, are sometimes very controversial because it is not possible to say exactly
what they are meant to measure, but they sometimes deliver astonishing results, especially in combination
with machine learning approaches. I think that these approaches are therefore likely to be used primarily
to cluster different texts according to their similarities.
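
To make the idea of measuring word frequencies a little more concrete, here is a minimal Python sketch of one textbook stylometric comparison: relative frequencies of common function words, compared by cosine similarity. The word list, the function names, and the choice of similarity measure are assumptions made for this example, not a description of any tool the authorities actually use.

```python
import math
import re
from collections import Counter

# Short illustrative list of English function words (an assumption; real
# studies use much longer lists in the language of the texts examined).
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "but", "not", "with", "for"]

def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[word] / total for word in FUNCTION_WORDS]

def cosine_similarity(a, b):
    """Cosine of the angle between two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

text_a = "The claim was written in haste, but not without care for the details."
text_b = "In the end the text was not signed, and that was the point of it."
print(cosine_similarity(profile(text_a), profile(text_b)))
```

A high similarity score says nothing by itself about authorship. As noted below, texts of the same genre often resemble each other more than texts by the same author, which is one reason these methods remain controversial.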

The clear advantage of such quantitative analyses is that they can be performed en masse. All digitally
available or digitizable texts can be analyzed in this way. From social media posts to books, texts can be
captured using these methods. The success of these methods is currently still relatively modest,
and it has often turned out that supposedly similar texts resemble each other more in genre than in
authorship. But if one assumes that individual writing styles can leave behind quantitative
patterns, this means that once these patterns are known, a mass assignment of texts to certain authors will
be possible.


And now what? 


There were and are, of course, various approaches to dealing with this knowledge, one not better or worse
than another. Those who do not write communiqués anyway largely avoid this problem, but are still affected 
by the problem of participation in publications and authorship of other texts. Whoever obscures texts
before publication, for example by having several people successively rewrite and rephrase passages from
them, runs the risk of developing exploitable linguistic and stylistic characteristics in recurring,
similar constellations, or of failing to conceal individual characteristics. Whoever thinks that they can
dismiss the whole thing because none of their text samples are available, or because they are convinced
that the legal value of author recognition is too shaky, risks that in the future text samples might somehow
become available (for example because they are successfully convicted of authorship) or that the legal assessment of
the procedure changes. Those who trust that technology is not (yet) good enough may be surprised by
future developments. Those who use technical solutions to obscure their authorship run the risk of leaving 
new characteristics and traces, and also of producing poorly written communiqués that no one wants to 
read anyway. If you never write any texts regardless, you just don't write any texts. 

So do whatever appeals to you most, but do it from now on — if you haven't already — keeping these 
traces in mind and the queasy feeling in your stomach, which is said to have saved many a person from 
making a careless mistake at the crucial moment. 